Are we getting interactions wrong?
The role of link functions in psychological research

Laura Sità, Margherita Calderan, Tommaso Feraco,
Filippo Gambarota, Enrico Toffalini

For those who want to try running the simulations in real time:

QR code linking to https://github.com/sitalaura/link-functions/tree/main/R

or download the file from sitalaura.github.io/link-functions/R/datasim.R

1 Example

Simulated dataset 1

independent variable: age in years (years)

dependent variable: (variable)

[screenshot of the dataset]

Linear model

using the classical linear predictor

fitL = glm(y~age, data=d)

Linear model

what we don't see, because it is a default argument, is actually hidden in our code:

fitL_explicit = glm(y~age, family=gaussian(link="identity"), data=d)

the model uses the gaussian family and the identity link function

in GLMs, the link function connects the linear predictor Xβ, which can take any real value, to the mean of the response variable Y, so that predictions stay within the appropriate range of Y
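A minimal sketch of both points, using only base R. The toy data (the values used to generate `d`, `age`, and `y`) are assumptions made for illustration and are not the simulated dataset from the slides. The first part checks that the default `glm()` call and the explicit gaussian/identity call give the same fit; the second part shows how the inverse of three common links maps the same linear predictor onto different ranges.

```r
# Hypothetical toy data: names mirror the slides, generating values are assumed
set.seed(1)
d <- data.frame(age = runif(100, min = 6, max = 10))
d$y <- 2 + 0.5 * d$age + rnorm(100)

# Default call vs. explicit family and link
fit_default  <- glm(y ~ age, data = d)
fit_explicit <- glm(y ~ age, family = gaussian(link = "identity"), data = d)
same_coef <- isTRUE(all.equal(coef(fit_default), coef(fit_explicit)))
same_coef                     # TRUE: the two calls fit the same model
family(fit_default)$link      # "identity"

# The inverse link maps the linear predictor (any real number)
# onto the range of the mean of Y
eta <- seq(-3, 3, by = 1)
mu_identity <- gaussian(link = "identity")$linkinv(eta)  # unchanged
mu_log      <- poisson(link = "log")$linkinv(eta)        # exp(eta), always > 0
mu_logit    <- binomial(link = "logit")$linkinv(eta)     # strictly between 0 and 1

round(data.frame(eta, mu_identity, mu_log, mu_logit), 3)
```

With the identity link a negative linear predictor gives a negative predicted mean, which is harmless for an unbounded outcome but impossible for counts or proportions.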

2 Example

Simulated dataset 2

independent variable: age in years (years)

dependent variable: mistakes in a reading task (errors)

[screenshot of the dataset]

Linear model

using the classical linear predictor

fitL = glm(y~age, data=d)

Linear model

library(effects)  # allEffects()
library(ggplot2)
# ts (base text size) is assumed to be defined earlier in the script
fitL = glm(y~age, data=d)
# model predictions with confidence band over a fine grid of ages
effL = data.frame(allEffects(fitL,xlevels=list(age=seq(min(d$age),max(d$age),.05)))$"age")
ggplot(d,aes(x=age,y=y))+
  coord_cartesian(ylim=c(-0.4,max(d$y))) +
  geom_point(size=4,alpha=.5,color="darkblue")+
  geom_line(data=effL,aes(y=fit),size=2,color="darkred")+
  geom_ribbon(data=effL,aes(y=fit,ymin=lower,ymax=upper),alpha=.3,fill="darkred")+
  theme(text=element_text(size=ts,color="black"))+
  scale_y_continuous(breaks=seq(0,20,2))+scale_x_continuous(breaks=seq(0,20,.5))+
  ylab("Errors")+xlab("Age (years)")

✅ Appropriate model

using the appropriate distribution family=poisson

and the appropriate link

fitP = glm(y~age, family=poisson(link="log"), data=d)

in this case, link="log" ensures that the predicted mean of y stays between 0 and +∞
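A small sketch of why the range matters, under assumed values: the counts below are generated from a Poisson distribution with a made-up intercept and slope (not the ones in datasim.R), and both models are asked to predict slightly beyond the observed ages. The linear model is free to predict a negative number of errors, while the log-link model cannot.

```r
# Hypothetical count data: errors decrease with age (generating values assumed)
set.seed(2)
n <- 200
d <- data.frame(age = runif(n, min = 6, max = 10))
d$y <- rpois(n, lambda = exp(4 - 0.5 * d$age))

fitL <- glm(y ~ age, data = d)                                  # gaussian, identity
fitP <- glm(y ~ age, family = poisson(link = "log"), data = d)  # poisson, log

# Predicted mean number of errors, including ages outside the observed range
new_ages <- data.frame(age = c(8, 10, 12, 14))
pred <- data.frame(
  age     = new_ages$age,
  linear  = predict(fitL, newdata = new_ages),
  poisson = predict(fitP, newdata = new_ages, type = "response")
)
pred
```

`type = "response"` applies the inverse link, so the Poisson column is on the scale of the counts rather than on the log scale.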

✅ Appropriate model

fitP = glm(y~age, family=poisson(link="log"), data=d)

# model predictions (on the response scale) with confidence band
effP = data.frame(allEffects(fitP,xlevels=list(age=seq(min(d$age),max(d$age),.05)))$"age")
p1 <- ggplot(d,aes(x=age,y=y))+
  coord_cartesian(ylim=c(-0.4,max(d$y))) +
  geom_point(size=4,alpha=.5,color="darkblue")+
  geom_line(data=effP,aes(y=fit),size=2,color="blue")+
  geom_ribbon(data=effP,aes(y=fit,ymin=lower,ymax=upper),alpha=.3,fill="blue")+
  theme(text=element_text(size=ts,color="black"))+
  scale_y_continuous(breaks=seq(0,20,2))+scale_x_continuous(breaks=seq(0,20,.5))+
  ylab("Errors")+xlab("Age (years)")

p1 

✅ Appropriate model (blue one)

# overlay the Poisson fit (blue) and the linear fit (dark red)
p2 <- ggplot(d,aes(x=age,y=y))+
  coord_cartesian(ylim=c(-0.4,max(d$y))) +
  geom_point(size=4,alpha=.5,color="darkblue")+
  geom_line(data=effP,aes(y=fit),size=2,color="blue")+
  geom_ribbon(data=effP,aes(y=fit,ymin=lower,ymax=upper),alpha=.3,fill="blue")+
  geom_line(data=effL,aes(y=fit),size=2,color="darkred")+
  geom_ribbon(data=effL,aes(y=fit,ymin=lower,ymax=upper),alpha=.3,fill="darkred")+
  theme(text=element_text(size=ts,color="black"))+
  scale_y_continuous(breaks=seq(0,20,2))+scale_x_continuous(breaks=seq(0,20,.5))+
  ylab("Errors")+xlab("Age (years)")

library(patchwork)  # provides the | operator for side-by-side plots
p1 | p2

3 Adding an interaction

Simulated dataset 2

independent variable: age in years (years)

dependent variable: mistakes in a reading task (errors)

adding a new main effect

groups: typically developing children (group = 0) vs children with dyslexia (group = 1)

Huge false-positive interaction rate

✅ correct link="log"

→ the false-positive rate for the interaction is 3.5% (about OK)

Huge false-positive interaction rate

❌ incorrect link="identity"

→ the false-positive rate for the interaction is 76.9% (bad!)
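A simplified sketch of how such rates can be estimated by simulation. It is not the simulation behind the percentages above: the sample size, the coefficients, the number of replications, and the absence of repeated measures are all assumptions chosen to keep the example short, so the resulting rates will differ from those reported. Counts are generated with two main effects on the log scale and no interaction, then the interaction is tested under each link.

```r
set.seed(3)

# One dataset: Poisson counts with main effects of age and group on the
# log scale and NO age-by-group interaction (all numeric values are assumed)
simulate_data <- function(n = 200) {
  age   <- runif(n, min = 6, max = 10)
  group <- rep(0:1, length.out = n)
  y     <- rpois(n, lambda = exp(3.5 - 0.3 * age + 0.7 * group))
  data.frame(age, group, y)
}

# Wald p-value of the interaction term for a given family and link
interaction_p <- function(d, family) {
  fit <- glm(y ~ age * group, family = family, data = d)
  coef(summary(fit))["age:group", 4]  # 4th column holds the p-value
}

n_sim <- 200
p_identity <- numeric(n_sim)
p_log      <- numeric(n_sim)
for (i in seq_len(n_sim)) {
  d <- simulate_data()
  p_identity[i] <- interaction_p(d, gaussian(link = "identity"))
  p_log[i]      <- interaction_p(d, poisson(link = "log"))
}

# Proportion of datasets in which a (non-existent) interaction is "significant"
rate_identity <- mean(p_identity < 0.05)
rate_log      <- mean(p_log < 0.05)
c(identity = rate_identity, log = rate_log)
```

The interaction appears under the identity link because two parallel lines on the log scale fan out on the count scale: the group with more errors also shows a steeper decline in raw errors, which a linear model can only capture through an interaction term.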

Conclusions

there are many link functions (and families of distributions)

use the right one: otherwise you are very likely to find something that is not there!

We are conducting a systematic review of how often the wrong link function is used in psychological research and leads to a significant interaction: so far, quite often
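As a quick reminder of the first point, the families shipped with base R each come with their own default link. The sketch below only lists a few of them; which family and link suit a given outcome still has to be decided from the nature of the data.

```r
# A few families available in base R and their default link functions
families <- list(
  gaussian         = gaussian(),
  binomial         = binomial(),
  poisson          = poisson(),
  Gamma            = Gamma(),
  inverse.gaussian = inverse.gaussian()
)
default_links <- sapply(families, function(f) f$link)
default_links
```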

Materials & Contact

All materials are available on GitHub at sitalaura/link-functions

Questions and feedback: laura.sita@studenti.unipd.it

Supplementary materials

the wrong family does not necessarily cause issues

if we use the wrong family but the correct link, family=gaussian(link="log"), there is still no spurious interaction (as it should be)

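# requires library(glmmTMB); b0 (starting value for the intercept) is assumed to be defined earlier in the script
fitL_log_0 <- glmmTMB(y ~ age + group + (1|id), family = gaussian(link="log"), data=d, start=list(beta=c(b0, 0, 0)))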

fitL_log_1 <- glmmTMB(y ~ age * group + (1|id), family = gaussian(link="log"), data=d, start=list(beta=c(b0, 0, 0, 0)))

anova(fitL_log_0, fitL_log_1)
Data: d
Models:
fitL_log_0: y ~ age + group + (1 | id), zi=~0, disp=~1
fitL_log_1: y ~ age * group + (1 | id), zi=~0, disp=~1
           Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
fitL_log_0  5 1228.8 1246.4 -609.41   1218.8                         
fitL_log_1  6 1230.3 1251.5 -609.16   1218.3 0.4946      1     0.4819